United States Patent [19]
Ichige et al.
[54] IMAGE SIGNAL ENCODING AND
COMMUNICATING APPARATUS USING
MEANS FOR EXTRACTING PARTICULAR
PORTIONS OF AN OBJECT IMAGE
[75] Inventors: Kenji Ichige, Chigasaki; Takuya
Imaide, Fujisawa; Ryuji Nishimura;
Norio Yatsuda, both of Yokohama;
Hiroyuki Kuriyama, Tokyo; Mayuko
Oda, Kawasaki, all of Japan
[73] Assignee: Hitachi, Ltd., Tokyo, Japan
[21] Appl. No.: 418,688
[22] Filed: Apr. 7, 1995
[30] Foreign Application Priority Data
Apr. 15, 1994 [JP] Japan 6-076863
Jul. 7, 1994 [JP] Japan 6-155691
[51] Int. Cl.6 H04M 11/00; H04N 7/14
[52] U.S. Cl. 348/14; 348/17; 348/51;
379/96
[58] Field of Search 379/53, 54, 90,
379/93, 96, 97, 98; 348/13, 14, 17, 51
[56] References Cited
U.S. PATENT DOCUMENTS
5,073,927 12/1991 Grube 348/14
5,426,460 6/1995 Erving et al. 348/14
FOREIGN PATENT DOCUMENTS
0566188 10/1993 European Pat. Off. 348/17
OTHER PUBLICATIONS
A. N. Netravali, B.G. Haskell: "Digital picture" pp.
115-119, AT&T Bell Lab. (1988).
JP-A-57-129076.
US005710590A
[11] Patent Number: 5,710,590
[45] Date of Patent: Jan. 20, 1998
"Journal of Institute of Television Engineers of Japan", vol.
45, No. 7 (1991), pp. 793-799.
JP-A-62-120179.
"Systematic Image Encoding", Makoto Miyahara pp. 1-14,
IBC
JP-A-59-208983.
JP-A-4-205070.
ITU-T Recommendation H.261, Video codec for audiovisual
services at p x 64 kbit/s (1993).
W. F. Schreiber, "Fundamentals of electronic imaging
systems", Springer-Verlag (1993) p. 106.
"Encoding technology for television telephone and television
conference" p. 793, Journal of Institute of Television
Engineers of Japan, vol. 47 (1991), No. 7.
JP-A-5-27346.
JP-A-3-22753.
JP-A-6-225328.
"Intelligent image processing" chapter 8 pp. 132-139,
Shokodo (1994), Agiri and Nagahashi.
JP-A-5-22753.
Primary Examiner: Curtis Kurtz
Assistant Examiner: Stephen W. Palan
Attorney, Agent, or Firm: Antonelli, Terry, Stout & Kraus,
LLP.
[57] ABSTRACT
A picture communication apparatus includes an extracting
circuit for extracting video data of at least one portion from
video data inputted thereto, an encoder for respectively
encoding the extracted video data and the remaining video
data, and a multiplexer for multiplexing the encoded video
data. When encoding the video data, a predetermined amount
of codes is allocated to the respective video data. This
suppresses deterioration in the picture quality when con-
ducting an image communication via a transmission path
having a low transmission rate. The transmitted video is
displayed on a contoured display.
3 Claims, 10 Drawing Sheets
[Representative drawing (block diagram): two memories, extraction unit, displacement evaluator, coder, control unit, synthesize unit, multiplexer, decoder, demultiplexer]
07/26/2002, EAST Version: 1.03.0002
U.S. Patent Jan. 20, 1998 Sheet 1 of 10 5,710,590
[FIG. 1: block diagram; labeled blocks: extraction unit, two coders, control unit, data base, synthesize unit, two decoders, multiplexer, demultiplexer]
[FIG. 2: overall configuration; labeled blocks: display, codec, network]
U.S. Patent Jan. 20, 1998 Sheet 2 of 10 5,710,590
[FIG. 3: block diagram; labeled blocks: memory, decision unit, address generator, memory]
[FIG. 4: input image divided into extracted portions 1, 2, and 3 (extracted portion, code A) and a non-extracted portion (code B)]
U.S. Patent Jan. 20, 1998 Sheet 3 of 10 5,710,590
[FIG. 5: frames along a time axis; each frame carries CODE A together with one of CODE B1, CODE B2, ...; (CODE B1 + CODE B2 + ... + CODE Bm = CODE B)]
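The frame layout of FIG. 5 can be read as follows: every frame carries the latest code A for the extracted portions plus one of m slices of code B for the non-extracted portion, so that m consecutive frames together carry one complete code B. A minimal Python sketch of that reading is given below. The function names, the byte-string representation, and the equal-size slicing are assumptions made for illustration only; the patent does not specify them.

```python
def split_code_b(code_b: bytes, m: int) -> list[bytes]:
    """Split code B into m slices B1..Bm (equal-size slicing is an assumption)."""
    if m <= 0:
        raise ValueError("m must be positive")
    size = -(-len(code_b) // m)  # ceiling division
    return [code_b[i * size:(i + 1) * size] for i in range(m)]


def build_frames(codes_a: list[bytes], code_b: bytes) -> list[tuple[bytes, bytes]]:
    """Pair the code A of each frame with one slice of code B."""
    slices = split_code_b(code_b, len(codes_a))
    return list(zip(codes_a, slices))


def reassemble_code_b(frames: list[tuple[bytes, bytes]]) -> bytes:
    """Receiver side: CODE B1 + CODE B2 + ... + CODE Bm = CODE B."""
    return b"".join(b_slice for _, b_slice in frames)
```

With three frames, `build_frames([b"a1", b"a2", b"a3"], code_b)` sends a fresh code A in every frame while the receiver needs all three frames before `reassemble_code_b` yields the full code B.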
U.S. Patent Jan. 20, 1998 Sheet 4 of 10 5,710,590
[FIG. 7: block diagram; labeled blocks: extraction unit, control unit, coder, synthesize unit, decoder, multiplexer, demultiplexer]
[FIG. 8: memory map with areas for entire face, eyes, mouth, nose, eyebrows]
U.S. Patent Jan. 20, 1998 Sheet 5 of 10 5,710,590
[FIG. 9: frame multiplexing diagram; labels not recoverable]
[FIG. 10: block diagram; labeled blocks: display, projection unit, memory]
U.S. Patent Jan. 20, 1998 Sheet 6 of 10 5,710,590
[FIG. 11A: model of mouth; labels: upper teeth, lower teeth, upper lip, interior of mouth]
[FIG. 11B: basic form; labels: upper lip, lower lip]
[FIG. 11C: information of variation; labels: interior of mouth, vertical opening, horizontal opening]
U.S. Patent Jan. 20, 1998 Sheet 7 of 10 5,710,590
[FIG. 12: block diagram; labeled blocks: two memories, extraction unit, displacement evaluator, coder, multiplexer, control unit, synthesize unit, decoder, demultiplexer]
[FIG. 13: memory map with areas for entire face, eyes, mouth, nose, eyebrows]
U.S. Patent Jan. 20, 1998 Sheet 8 of 10 5,710,590
[FIG. 14: sequence of m frames; labels: basic information, displacement information]
[FIG. 15A: labels not recoverable]
[FIG. 15B: features to be computed for the constituent elements: width and height (W, H), coordinates of center of mass, size, gradient, color]
U.S. Patent Jan. 20, 1998 Sheet 9 of 10 5,710,590
[FIG. 16: table of element numbers indexed by part no. and type no.; parts: hair (0), face (1), right eye (2), left eye (3), right eyebrow (4), left eyebrow (5), mouth (6)]
[FIG. 17: stream of frame demarcation code, description of elements (frame n), frame demarcation code, description of elements (frame n+1); each element is described by element no., color (r-y, b-y), position (x, y), size]
U.S. Patent Jan. 20, 1998 Sheet 10 of 10 5,710,590
[FIG. 18: labels not recoverable]
[FIGS. 19A and 19B: labels: model image, transmitted image]
IMAGE SIGNAL ENCODING AND COMMUNICATING APPARATUS USING MEANS FOR EXTRACTING PARTICULAR PORTIONS OF AN OBJECT IMAGE
BACKGROUND OF THE INVENTION
The present invention relates to an image encoding and picture communication apparatus, for example a video phone and an image recording apparatus.
Conventionally, as communication apparatuses to communicate voices and tones, there has been used a video telephone facility (A. N. Netravali, B. G. Haskell, "Digital Picture", pp. 115-119, AT&T Bell Lab. (1988)). The apparatus includes a sending facility including an imaging apparatus, a voice input device, and an encoder circuit for encoding images and voices; a receiving facility including a decoder for decoding signals of images and voices, a display including a speaker and CRT; and a communication controller for communicating images and voices via a network. In such conventional apparatuses, the contents of an image produced by a camera are entirely encoded and transmitted via a transmission line, which leads to necessity of transmitting a large amount of data. Consequently, a low-priced videophone of a type conducting communication via a low-speed analog communication line has been attended with a problem in which the picture quality is considerably deteriorated or motion of pictures becomes uncomfortable and unnatural.
Various attempts have been made to cope with the problem above. For example, according to a videophone apparatus described in the JP-A-57-129076, a background image beforehand prepared is compared with a video currently being produced so as to accordingly clear the background, thereby achieving security control and minimizing the amount of image information to be transmitted.
However, when users of the system conduct communication while viewing images of each other, the images of the persons are most important in ordinary cases. Namely, background images of the respective persons are less important in many cases. In consideration of effective allocation of the limited amount of codes, it can be consequently regarded as inefficient to uniformly encode the constituent elements of an image in an [...] assign the same quantity of codes to objects having different values of significance to the communicating users. The videophone apparatus described in the JP-A-57-129076 requires the background image to be prepared in advance. Namely, consideration has not been given to operability and usability of the users.
SUMMARY OF THE INVENTION
It is therefore an object of the present invention to provide a picture communication apparatus which can be used even through a low-speed transmission path such as analog phone-line while retaining a satisfactory quality of picture.
To achieve the object according to the present invention, there are arranged image extraction means for extracting images of particular portions of an object, coding means for coding the extracted image portions and means for communicating a partner with image data.
The image extraction means extracts images of particular portions of a subject. Each of the extracted images is encoded in an encoding method or encoded by changing encoding parameters to produce a quantity of codes according to significance of the pertinent image portion. This optimally distributes codes to the respective portions of a screen image.
Moreover, in a videophone apparatus and a video conference system, a video signal produced by an imaging apparatus is encoded to be transmitted via a transmission path such as an integrated services digital network (ISDN). For example, for component National Television System Committee (NTSC) signals, when the signals are not compressed in transmission data, the transfer speed in terms of bits is 216 megabits per second (Mbps) according to the studio standards of color television. This leads to a requirement that the signals are to be more efficiently encoded to reduce the number of bits of transmission data. As the encoding method, there has been primarily employed a method described in pages 793 to 799 of the "Journal of Institute of Television Engineers of Japan", Vol. 45, No. 7 (1991). Namely, there is basically used a conditional pixel supplementing method on the basis of inter-frame estimation or forecasting in which only mobile portions are transmitted such that other encoding methods such as a discrete cosine transform (DCT) are additionally used in combination with the conditional pixel supplementing method. Thanks to development of such a highly efficient encoding technology, videophones and video conference systems using ISDN lines have been widely introduced to practical uses in business and industrial fields. A communication method in which an image is transformed into codes for transmission thereof has been described, for example, in the JP-A-62-120179 and in "Systematic Image Encoding" written by Makoto Miyahara in pages 1 to 14 of the IPC.
Although there have been known low-cost communication systems such as a videophone using analog transmission, to carry out transmission at a low transmission rate, the picture quality has been sacrificed to a considerable extent. This consequently leads to the following problems. Expression appearing particularly in a human face cannot be satisfactorily transmitted or displayed and variations in the expression cannot be communicated in a realtime fashion, resulting in an unnatural motion of the face.
Another object of the present invention is to provide a videophone system capable of producing a high-quality video in a realtime manner even through such a transmission line having a low transmission rate as an analog telephone line, thereby solving the above problem.
To achieve the above object according to the present invention, there are provided a database [...] related to a subject, a video camera including extraction processing means for extracting a subject shot by the camera and computing features thereof, encoding means for analyzing the features from the extraction processing means and converting the features into description of knowledge corresponding to the database, interface means for converting the description of knowledge generated by the encoding means into signals conforming to a signal system of a signal transmission path and transmitting the knowledge description to a receiver and converting a signal sent from a sender into description of knowledge, and decoding means for composing a signal according to the knowledge description [...] by reference to the database.
The constituent means above operate to achieve the object as follows.
When the sender transmits an image, the video camera including, in addition to the extraction processing means and encoding means, a signal processing circuit and a control circuit commonly used for digital video cameras conducts signal processing known in the processing of video signals produced by an imaging apparatus to resultantly generate such signals of the image as video signals. The extraction processing means extracts the subject from the video signal generated by the signal processing circuit to compute such features of elements of the extracted object as the size, contour, color, coordinates of center of mass, and gradient. The encoding means including a micro computer or the like analyzes information of the features from the extraction processing means, recognizes elements constituting the object and states thereof, and transforms the recognized information items into knowledge description corresponding to the database including knowledge of models related to the subject. The interface means transforms the knowledge description generated by the encoding means into signals conforming to a signal system of the transmission path and transmits the resultant signal through the transmission path.
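As an illustration of the feature computation named above (size, coordinates of center of mass, and gradient), the following Python sketch derives these quantities from a binary mask of one extracted element. The mask representation, the function name, and the use of second-order central moments to obtain the gradient (orientation) are assumptions chosen for this sketch; the patent does not state how the extraction processing means computes them.

```python
import math


def element_features(mask):
    """Compute features of one extracted element.

    mask is a list of rows of 0/1 values (an assumed representation);
    returns None when the mask contains no set pixel.
    """
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        return None
    n = len(pixels)
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    cx = sum(xs) / n  # center of mass, x
    cy = sum(ys) / n  # center of mass, y
    # Second-order central moments, used here to estimate the orientation.
    mu20 = sum((x - cx) ** 2 for x in xs) / n
    mu02 = sum((y - cy) ** 2 for y in ys) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    gradient = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return {
        "width": max(xs) - min(xs) + 1,
        "height": max(ys) - min(ys) + 1,
        "size": n,  # pixel count
        "center": (cx, cy),
        "gradient_deg": math.degrees(gradient),
    }
```

A dictionary of this kind could then be mapped by the encoding means onto the element number, position, and size fields of a knowledge description such as the one outlined in FIG. 17.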
The signal received via the transmission path is converted
by the interface means into knowledge description. The
decoding means decodes the knowledge description to
reconstruct the transmitted image. In this operation, the
decoding means accesses the database keeping therein a
large number of images of models related to the object and
then selects therefrom video data items associated with the
elements constituting the image sent from the sender so as
to restore the original image.
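The receiver-side step can be sketched in Python as a lookup followed by composition: each entry of the knowledge description selects a model image from the database, and the selected image is pasted into the output frame. The dictionary layout, the field names `element_no` and `position`, and the convention that the position is the top-left corner are assumptions for illustration; the patent only states that video data items are selected from the database according to the description.

```python
def compose_frame(description, database, width, height, background=0):
    """Rebuild a frame from a knowledge description.

    description: list of dicts with "element_no" and "position" (x, y), assumed fields.
    database: dict mapping element number to a model image (list of pixel rows).
    """
    canvas = [[background] * width for _ in range(height)]
    for item in description:
        model = database[item["element_no"]]  # select the stored model image
        x0, y0 = item["position"]
        for dy, row in enumerate(model):
            for dx, value in enumerate(row):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < width and 0 <= y < height:  # clip at the frame border
                    canvas[y][x] = value
    return canvas
```

Only the small description crosses the transmission path; the pixel data come entirely from the receiver's own database.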
That is, the sender does not transmit the image itself. The
image of an object such as a human face to be transmitted
is beforehand transformed into knowledge description rep-
resenting the image such that the knowledge description is
sent as transmission data. In the receiver, the knowledge
description is decoded into the image of the subject as
above.
With this provision, the amount of transmission data is
remarkably minimized and hence it is possible to construct
a videophone system capable of communicating high-
quality pictures in a realtime manner even through such a
communication line having a low transmission rate as an
analog telephone line.
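A rough sense of the reduction can be had from a back-of-the-envelope calculation, sketched here in Python. The seven elements per frame follow the parts listed in FIG. 16 and the fields follow FIG. 17, but every byte width and the frame rate below are assumptions made only for this estimate; the 216 Mbps figure is the uncompressed rate quoted in the Background.

```python
ELEMENTS_PER_FRAME = 7          # hair, face, two eyes, two eyebrows, mouth (FIG. 16)
BYTES_PER_ELEMENT = 1 + 2 + 4 + 2  # element no., color (r-y, b-y), position (x, y), size: assumed widths
FRAME_CODE_BYTES = 1            # frame demarcation code: assumed width
FRAMES_PER_SECOND = 30          # assumed frame rate
UNCOMPRESSED_BPS = 216_000_000  # uncompressed component signal, from the Background


def description_bit_rate() -> int:
    """Bit rate of the knowledge-description stream under the assumptions above."""
    bytes_per_frame = ELEMENTS_PER_FRAME * BYTES_PER_ELEMENT + FRAME_CODE_BYTES
    return bytes_per_frame * 8 * FRAMES_PER_SECOND


def reduction_factor() -> float:
    """How many times smaller the description stream is than the uncompressed signal."""
    return UNCOMPRESSED_BPS / description_bit_rate()
```

Under these assumed widths the description stream comes to 15,360 bits per second, which is within the reach of an analog telephone line modem, whereas the uncompressed signal is larger by a factor of about 14,000.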
BRIEF DESCRIPTION OF THE DRAWINGS
These and other objects and advantages of the present
invention will become apparent by reference to the follow-
ing description and accompanying drawings wherein:
FIG. 1 is a diagram showing a first embodiment of an
image communicating apparatus according to the present
invention;
FIG. 2 is a diagram showing the overall configuration of
an image communicating apparatus according to the present
invention;
FIG. 3 is a diagram showing constitution of an image
extracting circuit of the first embodiment;
FIG. 4 is a diagram for explaining an encoding method of
the first embodiment;
FIG. 5 is a diagram for explaining the encoding method
of the first embodiment;
FIG. 6 is a diagram showing a second embodiment
according to the present invention;
FIG. 7 is a diagram showing structure of encoding means
of the second embodiment;
FIG. 8 is a diagram showing a memory map of video data
in a storage;
FIG. 9 is a diagram for explaining an encoding method of
the second embodiment;
FIG. 10 is a diagram showing constitution of a display of
the second embodiment;
FIGS. 11A to 11C are diagrams for explaining an image
model of human mouth;
FIG. 12 is a diagram showing a third embodiment according
to the present invention;
FIG. 13 is a diagram showing a memory map of video
data in a memory;
FIG. 14 is a diagram showing the encoding method of the
third embodiment;
FIGS. 15A and 15B are diagrams for explaining an
example of the method of converting an image into knowledge
description;
FIG. 16 is a diagram showing an example of the contents
of a database;
FIG. 17 is a diagram showing an example of knowledge
description;
FIG. 18 is a diagram showing an image on the receiver
side immediately after the communication line is established
between the sender and the receiver; and
FIGS. 19A and 19B are diagrams for explaining a method
of receiving an image.
DESCRIPTION OF THE PREFERRED
EMBODIMENTS
Next, description will be given of an embodiment of a picture communication apparatus according to the present invention.
FIG. 2 shows structure of a picture communication apparatus such as a videophone facility including a user 1 conducting communication via the apparatus, a video input apparatus 2, a voice input device (microphone) 3, a display device 4, a codec 5, and a communication network 6.
The user 1 of the communicating apparatus communicates via a communication network with a partner using a similar communicating apparatus at a remote place. The imaging device 2 shoots an image of the user 1 and then inputs a video signal of the image to the encoder 5. The microphone 3 transforms the voice of the user 1 into a voice signal to be fed to the encoder 5. The encoder 5 encodes the video and voice signals into a signal of code (communication signal) conforming to the network 6 and then supplies the signal to the network 6. In addition to transmission of the communication signal from the user 1 to the network 6, the codec 5 conducts reception of a communication signal sent from the communicating partner 1 via the network 6 and then decodes the signal to restore the video and voice signals of the partner 1. The resultant video and voice signals of the communicating partner 1 are fed to the display 4 to be presented as an image and a sound.
FIG. 1 shows an example of structure of the encoding device 5 of FIG. 2, including an input terminal 20, output terminals 21, 22, an input and output terminal 23, an input signal before an extracting process 25a, an input signal after the extracting process 25b, encoding circuits 27a, 27b, a multiplexer 28, a transmission/receive unit 29, a demultiplexer 30, decoders 31a, 31b, and a signal synthesize unit 32. According to the present invention, voices are processed in the ordinary known method and hence description thereof will be avoided. The video signal of the user 1 produced from the imaging apparatus and microphone is received via the input terminal 20. The signal is encoded by encoding means on the sender side including the extracting circuit 24, encoder circuits 27a and 27b, and multiplexer 28. The encoded signal is transformed by the sending and receiving section 29 into a communication signal to be outputted via the input and output terminal 15 to the network. The sending
and receiving section 14 conducts the transmission and reception at the same time and receives via the input and output terminal 23 a communication signal containing an image and a voice from the communicating partner. The signal is decoded by decoding means including the separating circuit 30, decoding circuits 31a and 31b, and composing circuit 32 to restore the image signal of the partner. The image signal is then delivered from the output terminal 22. The video signal is sent to the display 4 to be represented as the image of the partner. Although not shown, when the image of the user 1 is to be displayed on the display 4 for confirmation, it is only necessary that a change-over operation is conducted in the sending and receiving section 29 to treat the transmission signal as a reception signal. Alternatively, the input video signal need only be supplied to the composing circuit 32 to be mixed with a received image so as to compose an image to be presented on the display 4.
On receiving the signal from the extracting circuit 24, the control circuit 26 sends a control signal to the imaging apparatus to obtain an optimal input image. The input image signal is first fed to the extracting circuit 24 to extract partial images of the object. In this embodiment, the shooting object is the user of the apparatus. The partial images include the eyes, mouth, etc. of the user. Since the images of the eyes and mouth vary in contour thereof more frequently than those of the other elements of the object, it is necessary to allocate a larger quantity of information items thereto. The extracted partial images (extraction signal 25b) and other partial images (non-extraction signal 25a) are inputted respectively to the encoders 27a and 27b for the encoding thereof. Although the encoding method is not limited, to restore a picture of a higher quality for the extracted partial images, a greater number of codes are generated from the encoder circuit 27b. For the encoder circuit 27a, there may be utilized any encoding methods ordinarily used for videophones (reference is made to ITU-T Recommendation H.261, Video codec for audiovisual services at p x 64 kbit/s (1993) and to "encoding technology for videophone and television conference" written in page 793 of the Journal of Institute of Television Engineers of Japan, Vol. 47 (1991), No. 7). The encoder 27b may be operated according to the encoding method such as an entropy encoding method (reference is made to page 106 of "Fundamentals of Electronic Imaging System" written by W. F. Schreiber and published from Springer-Verlag in 1993).
The multiplexer 28 multiplexes signals of codes produced from the encoders 27a and 27b in the preceding stage and sends the multiplexed signal to the sender and receiver section 29. The extracting circuit 24 conducts, in addition to extraction of partial images, an operation to compute for each extracted portion the size, contour, and position of a reference point or coordinates of center of mass of the extracted portion and then outputs the resultant data items to the controller 26.
To sense an object and to obtain features thereof, there may be adopted, for example, a method described in the JP-A-59-208983 in which features of an object are attained from differences between images sampled at a fixed interval of time. Alternatively, there may be utilized a method described in the JP-A-4-205070 in which portions of a video signal satisfying a preset condition, for example, a condition determined according to a luminance signal and a color difference signal, are regarded as candidates of the object. The current candidate region thus extracted is compared with a region of the object obtained before a predetermined period of time and stored in storage means to determine an area in which these regions are overlapped with each other, thereby finally extracting as the region of the object an area surrounding the overlapped area.
According to the data items from the extraction circuit 24, the control circuit 26 delivers a control signal from the output terminal 21 to regulate the direction or orientation and ratio of magnification of the imaging apparatus 2. As a result, the imaging apparatus 2 is desirably and automatically oriented to the user 1 to shoot an image having an appropriate size. The processing procedure is executed as necessary so that the imaging apparatus 2 automatically follows movement of the communicating person in front thereof. To adjust the orientation and magnification ratio of the imaging apparatus 2, the apparatus 2 may be mechanically or electronically operated. In an imaging apparatus including imaging devices such as charge-coupled devices, the electronic adjustment of orientation above can be achieved by using CCDs including marginal pixels which are other than those used for the output of the imaging apparatus. In addition, the magnification ratio can be electronically conducted by an operation generally called electronic zooming. In the encoding circuit 27b, for the decoding operation to be achieved later, the size and position of the extracted image are encoded together with the extracted image.
In operation on the receiver side, the received signals are separated by the separating circuit 30 into codes of extracted portions and those of the other portions. The separated codes are respectively decoded by the decoders 31a and 31b corresponding to the encoders 27a and 27b, respectively. There are resultantly attained images of the extracted portions and images of the non-extracted portions. These images are fed to the composing circuit 32 to produce an image according to information items of the sizes and positions of the extracted images.
FIG. 3 shows an example of constitution of the extracting circuit 24 of FIG. 1. The extracting method is basically identical to that described in the JP-A-4-205070. In the configuration, reference numerals 40 and 43 designate input terminals, a numeral 41 denotes memory means including one-bit data for each input pixel and keeping therein results of decision for extraction areas, a numeral 45 indicates memory means, a numeral 42 stands for a decision circuit, a numeral 44 represents an address generating circuit, and numerals 46 and 47 designate output terminals. A video signal is fed via the input terminal 40 to the decision circuit 42. The input terminal 43 is employed to input therefrom an extracting condition for each extraction portion. In this situation, it is allowed to specify levels of the luminance and chroma signals as the extraction condition. A plurality of condition items are set for each extraction portion. For example, for the portion of the mouth, a plurality of combinations of luminance and chroma signal levels are set as a red portion of the lip and a white portion of teeth. Since the lip color alters between persons, the luminance and chroma signal levels have allowance ranges, respectively. The decision circuit 42 decides image areas conforming to the extracting conditions in an image received from the input terminal 40. The memory means 41 stores therein the results of processing of the decision circuit 42, namely, extraction image areas for each frame. Furthermore, the memory means 41 stores the extraction image areas for each extraction portion. The results of decision are inputted again to the decision circuit 42 to be utilized as a candidate of an extraction region for the next frame. That is, the decision circuit 42 produces an image region by slightly expanding the previous extraction area for each extraction portion kept
in the memory means 41 to make decision for each extraction portion in the produced region. The decision circuit 42 computes the size and position of the image for each extraction portion to deliver the results from the output terminal 46. The data items of the size and position are employed to control the imaging apparatus. According to the control operation, the image of the user's face can be created in a fixed contour. The positional data item of each extraction portion is delivered to the address generating circuit 44 to generate an address in the memory means 41, thereby storing the extraction portions in separate locations, respectively. FIG. 4 shows an example of the results of address generation in which video data of extraction portions are combined with each other to configure one frame such as a CIF (Common Intermediate Format) for transmission. Image data stored in the memory means 45 is later read therefrom to be delivered from the output terminal 47.
FIG. 5 shows a method of multiplexing two kinds of video data items including those of the extracted portion (code A) and non-extracted portion (code B). The multiplexing of [...]. For the extracted portion, there is transmitted the latest data for each frame; whereas, for the non-extracted portion, one image is transmitted in an interval of a predetermined number of frames. Moreover, since the extracted portion includes a partial image, when the image is decoded later by the decoder, there is required information of the reproducing position thereof. Consequently, the code A includes also positional information of each extracted image.
According to the embodiment, a greater quantity of codes can be allocated to such images having a larger amount of information as images related to the mouth and eyes. Resultantly, the quantity of overall transmission data necessary for achieving a satisfactory quality of image can be reduced or an image of a higher quality can be obtained without increasing the amount of transmission data.
FIG. 6 shows an alternative embodiment according to the present invention. When compared with the preceding embodiment, this embodiment includes a solid image display of the human head in place of the display of the preceding embodiment. A reference numeral 90 stands for the solid image display and a numeral 91 indicates an encoding apparatus.
FIG. 7 shows constitution of an encoder circuit 12 including an input terminal 120, output terminals 121, 122, an input terminal 123, an input and output terminal 124, an extraction processing circuit 125, a control circuit 126, an encoding circuit 127, a multiplexer circuit 128, a sender and receiver section 129, a separating circuit 130, a decoder circuit 131, and a composing circuit 132. Functions of the respective circuit blocks are the same as the corresponding constituent components of the preceding embodiment. The extracting circuit 125 extracts the elements of the face and the entire face to present the human face on the display 90. Assume that the constituent elements to be extracted are the overall face, eyes, mouth, nose, eyebrows, etc. The elements are assigned with priority levels for the encoding operation thereof. For example, variation in the image of the entire face is less than that in the image of the mouth and hence the entire face is assigned with a lower priority level. The eyes and mouth are equally important in this regard and accordingly assigned with the same priority level so as to allocate [...] of the respective constituent components of the face. An example of the solid image display has been described, for example, in JP-A-5-27346 and JP-A-3-22753. In a flat-plane or two-dimensional display, it is only necessary that the sizes of the respective components approximately reflect those of the actual components of the object in the imaging operation thereof. However, in a three-dimensional display, the positions of the components are required to correctly reflect those of the components of the solid display. For this purpose, the positions of the eyes and mouth are adjusted to fit [...] to resultantly decide the sizes of the elements in an automatic fashion. In this connection, the extracting circuit 125 computes positional relationships between the elements to attain the positions of the eyes and mouth in the extracting operation. In the positional adjustment, the control circuit 105 is operated according to the positions obtained by the extracting circuit 125 to adjust the imaging position and magnification ratio on the [...] process is appropriately carried out by [...] between the vertical and horizontal [...] by the so-called electronic zooming function. The image of the face is regulated on the sender side to match the solid image display 90; thereafter, the constituent components are extracted.
The extracting circuit 125 is configured in the same way as for that of the preceding embodiment shown in FIG. 3. However, in case where the extracted face includes a plurality of constituent elements or variable number of constituent elements, it is difficult to combine the extracted elements to match the format of the transmission frame as described above (FIG. 4). Consequently, there will be introduced another layout of the memory means 45 as shown in FIG. 8. This includes the respective constituent elements simply in memory blocks of the predetermined sizes. The data items outputted from the extracting circuit 125 can be attained by issuing a read command to the memory means 45. Namely, there are obtained data items constituting an image of each extracted portion.
The encoder 127 encodes the output from the extracting circuit 125. The encoding method or various parameters used in the encoding of data are determined, as described in conjunction with the preceding embodiment, according to the kind and the priority levels for each extracted image. Each of the encoded images is multiplexed by the multiplexer 128 according to the priority level thereof.
FIG. 9 shows an example of the multiplexing process. Each frame includes a header code field containing, for example, information indicating an internal format of the pertinent frame and a frame identifier (ID) to identify the frame and a video data field of each extracted portion. Video data items of the respective constituent elements are distributed according to the priority levels so that the related portions of a predetermined number of frames constitute information of one frame. According to the distribution method of FIG. 9, only the portions of the face having a lower priority level are distributed into a predetermined number of frames for transmission thereof. The method of distributing codes into a plurality of frames requires a data buffering operation in the multiplexer 128.
The multiplexed codes are transmitted via the sender and receiver section 129 to the network. On the receiver side, the codes are processed primarily by the demultiplexer circuit 130, decoder circuit 131, and synthesize unit 132. The
a larger quantity of codes thereto. demultiplexer circuit 121 separates codes for each priority
When it is impossible to vary the shape of solid image 65 level from the multiplexed codes from the sender side. The
display according to the face, the display 90 is subjected to separated codes are respectively decoded by the decoder 131
a model of a face having average features to fix the positions to reconstruct images of the respective constituent elements.
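The FIG. 9 distribution scheme can be illustrated with a minimal sketch. It assumes, purely for illustration, that high-priority elements (for example the eyes and mouth) supply fresh encoded data for every frame, while each low-priority element (for example the entire face) supplies one encoded picture that is cut into equal slices and spread over a predetermined number of frames. The function names, the dictionary-based frame layout, and the byte payloads are hypothetical and are not taken from the specification.

```python
def multiplex(high_per_frame, low_payloads, spread):
    """Build `spread` transmission frames.

    high_per_frame: list of `spread` dicts {element name: bytes}, one per frame.
    low_payloads:   dict {element name: bytes}, one picture per element,
                    distributed over the `spread` frames (assumed scheme).
    """
    if len(high_per_frame) != spread:
        raise ValueError("need one set of high-priority data per frame")
    frames = []
    for frame_id, high in enumerate(high_per_frame):
        fields = dict(high)  # high-priority data is carried whole in every frame
        for name, payload in low_payloads.items():
            size = -(-len(payload) // spread)  # ceiling division
            fields[name] = payload[frame_id * size:(frame_id + 1) * size]
        # Header: frame identifier plus a description of the internal format.
        header = {"frame_id": frame_id, "layout": sorted(fields)}
        frames.append({"header": header, "fields": fields})
    return frames


def reassemble(frames, name):
    """Receiver side: buffer the slices of one low-priority element and join them."""
    return b"".join(frame["fields"][name] for frame in frames)
```

In this sketch the receiver can rebuild a low-priority picture only after all `spread` frames have arrived, which mirrors the buffering requirement mentioned for the multiplexer 128.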
Since the frame frequency varies between video data items received
for the respective priority levels, the decoder 131 includes memory
means for updating video data in the memory for each constituent
element. The internal memory means may be configured in the memory
format of the extracting circuit shown in FIG. 8. Signals read from
the memory means are used as output signals from the decoder circuit
131.
On receiving the video output from the decoder 131, the synthesize
unit 132 composes an image of each constituent element. Since the
positions of the respective elements conform to information from the
display 90, it is unnecessary for the sender side to transmit
positional information together with the video data. However, if the
format for the display 90 is unique, the display 90 need not transmit
even the information above.
FIG. 10 shows structure of the solid image display 90 including a
display unit thereof 140, a projecting unit 141 for projecting a
picture onto the display unit 140, memory means 142, and an input and
output (I/O) terminal 143. The memory means 142 stores therein data
related to formats of the display unit 140. The data includes data
representing three-dimensional positions and sizes of the constituent
elements of the face. Since these data items are not changed if the
shape of the display is not variable, there is only required a
read-only memory fixed for the display. Information of formats is
sent via the I/O terminal 143 to the encoding apparatus such that the
encoder supplies the display with an image conforming to the display.
As above, if information is communicated between the display and the
encoding apparatus, it is possible to employ a display of another
type.
When an image communication system includes the solid image display
as above, there is obtained, in addition to the effect of the
preceding embodiment, an advantageous feature that the communication
partner is reproduced in the vicinity of the user in a
three-dimensional manner. As a result, the communication can be
achieved as if the partner were in front of the user of the
apparatus. Moreover, according to the embodiment, only the image of
the human face is transmitted, namely, the background image is not
included in the transmission data. Furthermore, the quantity of codes
is allocated for each portion according to the priority level or
significance level thereof. Consequently, high-quality pictures can
be transmitted even through a transmission path of a low transmission
rate.
FIG. 12 shows an alternative embodiment according to the present
invention. The diagram specifically shows constitution of the
encoding apparatus in which the same constituent components as those
of the preceding embodiments are assigned with the same reference
numerals. The apparatus of FIG. 12 includes memory means 180a, 180b,
a displacement evaluator 181, an encoder circuit 182, and a decoder
circuit 183.
In this embodiment, in addition to extracting partial images of the
face, there is conducted an operation to encode information related
to structure of each portion of the face. As described above, the
human face includes a plurality of portions and each portion has its
own structure. FIG. 11 shows an example of structure of the human
face.
The image of the mouth section is considered to include the upper
lip, lower lip, upper teeth, lower teeth, and interior of mouth as
shown in FIG. 11A. These images do not basically vary for a person
during communication thereof. Consequently, information thereof can
be classified into basic image information (basic information) and
information of variation or deformation thereof (variation
information). For the mouth image, the basic information includes lip
image data as shown in FIG. 11B and the variation information can be
specified by the opening [...] as shown in FIG. 11C. Video data of a
variation of the mouth can be reconstructed by modifying the basic
information according to the variation information. Similar
processing also applies to the other extracted portions.
An image supplied via the input terminal 120 is fed to the extraction
circuit 125 and undergoes an extracting operation. Basic information
obtained as a result of extraction is stored in the memory means 180a
and extracted images changing in a [...] manner are supplied to the
displacement calculating circuit 181. The basic information for mouth
portion can be decided in two methods. In a first method, a point of
time to acquire basic information is specified by the user. In a
second method, basic information is obtained by the apparatus. In an
example of the mouth, information related to an image of the mouth in
an ordinarily closed state is assumed as basic information, whereas
magnitude of variation thereof is used as variation information. In
the first method, a point of time when the mouth image in the closed
state is obtained is determined by the user. In the second method, an
image of only the mouth is attained by an extracting operation.
Magnitude of opening of the mouth is monitored after the
communication is started or during a fixed period of time beginning
at a predetermined point of time so as to decide a point of time when
the magnitude takes a minimum value. This point of time is assumed to
be when the mouth is closed, thereby attaining the basic information.
The basic information for mouth portion attained by either one of
these methods is compared with the extracted image at the specified
point of time by the displacement calculating circuit 181 to obtain
information of displacement. The encoding circuit 182 receives as
inputs thereto the basic information and variation or displacement
information and then encodes the information. For each extracted
image, the obtained codes respectively of basic and variation
information items are multiplexed by the multiplexer 128 to be
transmitted via the sender and receiver section 129.
FIG. 13 shows the memory format of data items in the memory means
180a. The format may be similar to that of the memory means in the
extraction circuit 125.
FIG. 14 shows the multiplexed data format employed by the multiplexer
128. Basic information is transmitted for each predetermined number
of frames. A frame not containing the basic information is used to
send variation information. Each of the basic and variation
information items includes data items of the respective extraction
portions.
In the data reception, received codes are disassembled by the
demultiplexer circuit 130 into codes of the respective extraction
blocks. Moreover, the codes are classified into those of basic
information and those of displacement information. Each unit of
separated information is decoded into data of basic or displacement
information by the decoder 183. The basic and displacement
information items are then sent to the memory means 180b and
composing circuit 132, respectively. The composing circuit 132 reads
the basic information from the memory means 180b to execute an
operation of transforming the basic information according to the
displacement information to reproduce each extraction portion and
then arranges the respective extraction portions at the pertinent
positions to compose an image. The composed image is delivered as an
output image from the output terminal 122.
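The second method of deciding the basic information, together with the basic/variation frame format of FIG. 14, can be sketched as follows. This is a deliberately simplified illustration: the mouth is reduced to a single number (its opening), the basic information is taken as the smallest opening observed during the monitoring period, and the variation information is the difference from that value. The sketch also assumes that variation information is present in every frame and that basic information is added at a fixed interval; the function names and the dictionary frame layout are hypothetical.

```python
def pick_basic_index(openings):
    """Index of the sample with the minimum mouth opening (assumed to be closed)."""
    return min(range(len(openings)), key=openings.__getitem__)


def encode_stream(openings, basic_interval):
    """Sender side: basic information every `basic_interval` frames,
    variation (displacement) information in each frame."""
    basic = openings[pick_basic_index(openings)]
    frames = []
    for n, opening in enumerate(openings):
        frame = {"variation": opening - basic}
        if n % basic_interval == 0:
            frame["basic"] = basic
        frames.append(frame)
    return frames


def decode_stream(frames):
    """Receiver side: keep the latest basic information (the role of the
    memory means 180b) and transform it by each frame's variation."""
    basic = None
    restored = []
    for frame in frames:
        if "basic" in frame:
            basic = frame["basic"]
        restored.append(basic + frame["variation"])
    return restored
```

With real image data the subtraction and addition would be replaced by whatever deformation model the encoder and the composing circuit share; the scheduling logic is the part this sketch is meant to show.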
According to the embodiment described above, each extraction portion
is disassembled into basic information including basic image data and
displacement information including displacement data relative to the
basic information so as to transmit the resultant codes. The basic
information including a larger number of codes is not transmitted at
each frame. Namely, the basic information is transmitted at an
interval of a predetermined number of frames, whereas the
displacement information including a lower number of codes is
contained in each frame to be transmitted. This remarkably decreases
the quantity of transmission codes.
Next, description will be given of a process in which an image
attained by an imaging apparatus is transformed into knowledge
description for transmission and received video data including
knowledge description is converted into an original image by
reference to a database containing knowledge description data.
Specifically, when an image extracted by the extracting circuit 24 is
encoded by the encoder 27b, the database 40 is referenced to
transform the extracted image into knowledge description.
Furthermore, when receiving image data in the form of knowledge
description, the decoder 31b accesses the database according to the
knowledge description to thereby decode the video data into the
original image. In this operation, video data items corresponding to
the respective elements constituting the image transmitted from the
sender side are selectively read from the database including a
multiplicity of images of models associated with objects to be
imaged. The selected video data items are combined with each other to
restore the original video image. Next, description will be
specifically given of knowledge description. For methods of
describing knowledge, reference is to be made, for example, to
Chapter 8 (pages 132 to 139) of "Intelligent Image Processing"
written by Agui and Nagahashi and published from Shokodo in 1994.
An example of the method of transforming an image of a human into
knowledge description will be described by reference to FIGS. 15A and
15B. FIG. 15A shows an image of an object obtained when shooting a
person by an imaging apparatus. From this image, an image related to
the person is extracted to be disassembled into constituent elements
such as the hairs, face, eyes, mouth, and body so as to obtain
features including the coordinates of center of mass, width, height,
size, and color of each element. There are also acquired such
features as the width of the iris of each eye, the width and height
of the interior of mouth, and gradient values of eyes and eyebrows.
These features are transformed into data items respectively assigned
with element numbers in association with the database as shown in
FIG. 16.
FIG. 17 shows an example of knowledge description. For each element,
one set of knowledge description items is specified in the form of
(element number, color(r-y,b-y), position(Δx,Δy), size). In this
expression, position(Δx,Δy) indicates the discrepancy between the
coordinates of center of mass of the pertinent object and that of
each element. As can be seen from FIG. 17, data items of knowledge
description of constituent elements of object are described
immediately after a frame demarcation code. Assume that the object
includes, for example, ten constituent elements and each item, such
as the element number, is represented by an eight-bit data item. The
six items of each element then occupy 48 bits, and the amount of data
required for each frame resultantly becomes 480 bits. As above, the
volume of transmission data can be remarkably reduced by converting
an image into knowledge description. In addition, when the system is
configured to transmit only the knowledge description of a
constituent element changed prior to data transmission, the amount of
transmission data can be much more decreased.
To restore the original image from the knowledge description, images
corresponding to element numbers of the knowledge description are
read from the database to be combined with each other so as to
compose the objective image. When arranging each constituent element
on the screen, the position described as (0,0) for the element in the
knowledge description is aligned at the central position of the
screen. As described above, since the position indicates the
difference between the coordinates of center of mass of the object
and that of each element, position (0,0) stands for the center of
mass of the object. With this provision, there can be achieved
correction of positions so that the object continuously stands at the
central position of the screen in any situation.
In the direction of depth in the screen, the respective images are
presented with such a positional relationship that the smaller items
are arranged in the upper layers. Moreover, when colors of images of
constituent elements such as the [...] and iris of each eye in the
database are replaced with colors expressed by the knowledge
description, the restored image will become more similar to the
original image on the sender side.
As above, the image itself is not used as the transmission data. The
image of a transmission object (such as a human face) is transformed
into knowledge description representing the image so as to send data
of the knowledge description to the communicating partner. On the
receiver side, the original image of object is restored according to
the received knowledge description. In consequence, the amount of
transmission data is considerably minimized and there can be provided
a videophone system capable of producing a high-quality picture in a
realtime fashion even through such a communication line having a low
transfer rate as an analog telephone line.
Additionally, it may also be possible in the data communication that
important elements of the object are transmitted in the form of
knowledge description and the other elements are transferred as video
signals. In this operation, the knowledge description is transmitted
in a realtime manner, whereas image information of the overall screen
is transmitted at a low transfer speed in the range of the transfer
rate of the communication path. When transmitting, for example, an
image of a human face, images of the eyes and mouth important for
communication are sent in a realtime fashion.
Furthermore, when the image of the object shot by the imaging
apparatus is extracted from the overall image of object by the
extracting circuit and the images of the remaining portions are
replaced with one color, the transmission data can be more
efficiently compressed.
However, since information of the entire screen is transmitted at a
low transmission speed in the method above, only the eyes and mouth
are displayed on the screen as shown in FIG. 18 immediately after the
communication line is established. To overcome this difficulty, there
may be prepared a model image of the human head portion in the
database 1. Immediately after the communication line becomes
available, the eyes and mouth are composed according to the knowledge
description received in a realtime manner such that the images of
eyes and mouth are combined with the model image so as to display the
composed image on the screen as shown in FIG. 19A. As can be seen
from FIG. 19B, when the model image is replaced thereafter with
images sequentially received from the sender side, there is
continuously displayed a natural image even immediately after the
communication line is connected. Namely, the presented
image is gradually changed from the model image into the
human image of the sender without causing any undesirable
artificial expression, and hence the viewer can obtain a
naturally reproduced image.
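The 480-bit figure given for the knowledge description of FIG. 17 follows from ten elements, each described by six eight-bit items (element number, r-y, b-y, Δx, Δy, size). The sketch below packs and unpacks such a frame to make the arithmetic concrete. The one-byte frame demarcation code, the choice of signed bytes for the color-difference and position items, and the function names are assumptions made for illustration; the specification does not define them.

```python
import struct

FRAME_MARK = b"\xff"       # hypothetical frame demarcation code
ELEMENT_FORMAT = "BbbbbB"  # number, r-y, b-y, dx, dy, size: six 8-bit items
ELEMENT_SIZE = struct.calcsize(ELEMENT_FORMAT)  # 6 bytes = 48 bits


def pack_frame(elements):
    """elements: iterable of (number, r_y, b_y, dx, dy, size) tuples."""
    body = b"".join(struct.pack(ELEMENT_FORMAT, *e) for e in elements)
    return FRAME_MARK + body


def unpack_frame(frame):
    """Inverse of pack_frame; returns a list of six-item tuples."""
    body = frame[len(FRAME_MARK):]
    return [struct.unpack_from(ELEMENT_FORMAT, body, offset)
            for offset in range(0, len(body), ELEMENT_SIZE)]
```

For ten elements the body is 60 bytes, that is 480 bits, excluding the assumed demarcation code.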
As above, even when there is utilized a communication
line of a low transmission rate such as an analog telephone
line, the elements of expression and the like of a human face
essential for communication can be transmitted in a realtime
fashion while transferring video data of the overall screen
image. This leads to an advantageous effect similar to that of
the embodiment shown in FIG. 1.
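The restoration step described for the knowledge description (position (0,0) aligned with the center of the screen, smaller elements placed in the upper layers) can be sketched as a draw-order computation. The dictionary keys, the use of width times height as the measure of size, and the function name are hypothetical; fetching the element images from the database is outside the scope of this sketch.

```python
def compose(elements, screen_w, screen_h):
    """Return a bottom-to-top draw list of (name, left, top) screen positions.

    elements: list of dicts with keys name, dx, dy, w, h, where (dx, dy) is
    the offset of the element's center of mass from that of the object.
    """
    cx, cy = screen_w // 2, screen_h // 2  # object center of mass maps here
    # Larger elements are drawn first so that smaller ones end up on top.
    ordered = sorted(elements, key=lambda e: e["w"] * e["h"], reverse=True)
    return [(e["name"],
             cx + e["dx"] - e["w"] // 2,
             cy + e["dy"] - e["h"] // 2)
            for e in ordered]
```

Because every offset is relative to the object's center of mass, the composed object stays centered on the screen regardless of where it stood in the sender's camera image.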
While the present invention has been described with
reference to the particular illustrative embodiments, it is not
to be restricted by those embodiments but only by the
appended claims. It is to be appreciated that those skilled in
the art can change or modify the embodiments without
departing from the scope and spirit of the present invention.
We claim:
1. A picture communication apparatus, comprising:
imaging means;
voice input means;
extracting means for extracting at least one portion of an
image of a subject from an image produced by said
imaging means;
encoding means for respectively encoding the image
portion extracted by said extracting means and a voice
inputted by said voice input means;
communicating means for communicating via a commu-
nication network data obtained by encoding the image
portion and the voice by said encoding means;
decoding means for decoding data received from said
communicating means and thereby restoring the
extracted image portion and the voice;
synthesizing means for composing an image;
a display having a surface including depressions and
projections for displaying the image composed by said
synthesizing means;
a memory for storing information representing three-
dimensional positions and sizes of constituent elements
of the image to be displayed on the depressions and
projections of said display;
data input/output means for transferring said information
from said memory to said synthesizing means, whereby
said synthesizing means synthesizes the image accord-
ing to the information received from said data input/
output means and the extracted image portion decoded
by the decoding means to produce data representing the
synthesized image which is coordinated to the depres-
sions and projections of said display in accordance with
said three-dimensional positions and sizes; and
projection means responsive to said synthesized image
data for projecting a synthesized image onto said
display.
2. A picture communication apparatus according to claim
1, wherein
the image of the extracted portion is a part of a human
face; and
the depressions and the projections in the surface of the
display have a general contour similar to a human face.
3. A picture communication apparatus according to claim
1, wherein the image of the extracted portion is a part of the
human face of a user of a picture communication apparatus;
and
the information stored in said memory represents the
remaining non-extracted portion of the human face of
said user.
* ♦ * * *